JointTracker: Real-time inertial kinematic chain tracking with joint position estimation

In-field human motion capture (HMC) is drawing increasing attention due to the multitude of application areas. Plenty of research is currently invested in camera-based (markerless) HMC, with the advantage of no infrastructure being required on the body, and additional context information being available from the surroundings. However, the inherent drawbacks of camera-based approaches are the limited field of view and occlusions. In contrast, inertial HMC (IHMC) does not suffer from occlusions, thus being a promising approach for capturing human motion outside the laboratory. However, one major challenge of such methods is the necessity of spatial registration. Typically, during a predefined calibration sequence, the orientation and location of each inertial sensor are registered with respect to the underlying skeleton model. This work contributes to calibration-free IHMC, as it proposes a recursive estimator for the simultaneous online estimation of all sensor poses and joint positions of a kinematic chain model like the human skeleton. The full derivation from an optimization objective is provided. The approach can directly be applied to a synchronized data stream from a body-mounted inertial sensor network. Successful evaluations are demonstrated on noisy simulated data from a three-link chain, real lower-body walking data from 25 young, healthy persons, and walking data captured from a humanoid robot. The estimated and derived quantities, global and relative sensor orientations, joint positions, and segment lengths can be exploited for human motion analysis and anthropometric measurements, as well as in the context of hybrid markerless visual-inertial HMC.


Introduction
Human motion capture (HMC) is a well-studied field with multiple application areas like healthcare, sports 1,2 , human-machine interaction 3 , workplace analysis 4,5 , robotics, teleworking, additive manufacturing and human safety 6 .Numerous HMC methods utilize camera-based approaches, including marker-based [7][8][9] or markerless 8,10 multi-view setups, monocular RGB cameras 11 , RGB-D cameras 12 , or event cameras 13 .These approaches capture human pose and motion, mainly employing a third-person perspective, but ego-perspective visual approaches also exist 14 .With the availability of extensive databases containing 2D 15 and 3D 16 human pose information and advancements in feature extraction techniques, such as deep learning 11,17 , monocular HMC has gained significant attention in recent years, both from external 11,[17][18][19][20] or ego-perspective 14 .Many visual mocap approaches rely on joint position detections on the image plane (2D) with a subsequent lifting of the joint positions into 3D space 20 .These approaches do not provide reliable estimates of internal rotations of different body segments, but the pose can be illustrated by connecting the segments at the joints, which is common practice in the vision community 11 .Despite the advancements in camera-based HMC, these approaches face limitations such as occlusions caused by external factors or self-occlusions, which can impact their accuracy.
Inertial HMC (IHMC) uses multiple inertial sensors (IMUs) mounted on body segments, using e.g., straps or clothing integration [21][22][23] .The motion is typically deduced via a suitable state estimation approach in combination with a personalized biomechanical model 24 .IHMC is an approach for in-field HMC that does not suffer from occlusions.However, it comes with other challenges, such as integration drift 25 , sensitivity to magnetic disturbances 26 or calibration issues.The latter are typically related to segment lengths and sensor poses with respect to the anatomical reference frames of the associated body segments, the so-called IMU-to-segment (I2S) calibrations [27][28][29][30] .Some of the mentioned challenges are connected, e.g.omitting magnetometer measurements can lead to orientation drift, while magnetometer usage can violate the often-used constant magnetic field assumption and lead to incorrect state estimates.The I2S calibration is a major concern for assessing and monitoring anatomical joint angles since I2S orientation errors linearly propagate into joint-angle errors 27 .I2S calibration is currently considered the main challenge for IHMC in biomechanical motion analysis 31 , and an automated (self-calibrating) approach can be regarded as crucial for reliable in-field usage of the inertial tracking technology.Self-calibrating IHMC approaches include the direct I2S alignment estimation.Some promising approaches for I2S orientation and position calibration for different body parts are proposed in 28,29,32,33.Direct estimation of biomechanical joint angles without I2S alignment is proposed in 34,35.This work contributes to self-calibrating inertial motion capture of a kinematic model, consisting of joint positions with global IMU poses, as shown in Figure 1 (left).Hence, the focus is on self-calibration of the joint positions in the IMU frames or, in other words, on a calibration-free approach with respect to the I2S orientations and positions.The kinematic model can be used to track joint positions, i.e. joint trajectories, for motion analysis, and it delivers consistent information with visual 3D joint tracking, e.g.20.Moreover, it allows estimating IMU trajectories in one coordinate system with minimal magnetometer usage (usually one measurement sample only for initialization).For the skeleton state estimation and joint parameter identification, we propose a consistent recursive estimation of IMU poses in a global coordinate frame and joint positions in the IMU coordinate frames.
Previous approaches to joint position estimation typically work offline, often also independently per joint, and require high-frequency inertial data; e.g., 32,37,38, though drift-free, typically require a higher sampling rate for obtaining accurate results as compared to approaches working on integrated states.A recent integration-based example is 34, which estimates IMU poses, multiple joint positions, and hinge axes, supported by segment length information, in an offline manner.None of the previous approaches investigated an indicator for convergence of the parameters to be estimated.
The main contributions of this work are: • mathematical derivation of a consistent optimization-based recursive state estimator • online estimation of joint positions for three degrees of freedom (DoF) joints for kinematic chain models • online segment length estimation for all segments connecting two joints • convergence indicator for the motion-dependent joint position parameters • online IMU pose estimation with minimal magnetometer usage (only for initialization).
In the following, Section 2 introduces the mathematical notation and derives the recursive state estimator, including the corresponding models, while also addressing initialization, observability, and identifiability.Section 3 evaluates the method on three different experimental setups: 1) simulated IMU data from a three-link chain fixed in space, 2) real lower-body walking data from a group of young, healthy persons, and 3) real lower-body walking data from a humanoid robot.Section 4 concludes this work and discusses open problems and further research.

Methods
In this section, we present the notation (2.1), the problem statement (2.2), the derivation of the optimization-based recursive state estimator (2.3), and the motion (2.4) and measurement models (2.5) used to estimate the state of the kinematic chain model, illustrated in Figure 1 and Figure 2.

Notation
The following description uses Figure 2 as a reference illustration.The navigation frame N has its z-axis aligned with gravity, and its x-axis pointing towards magnetic north.It is the reference frame to which all IMUs are registered.Each IMU i ∈ I, |I| = N I , has its own coordinate system, I i , which is related to the navigation frame via a pose transformation, e.g., using a homogeneous matrix 3 .1 0 Here, N i I ∈ ℝ 3 denotes the position of the IMU in the navigation frame.The orientation matrix R NI i ∈ S O (3) denotes the IMU rotation relative to the navigation frame.Hence, a vector that is right multiplied to R NI i is transformed from I i to N. The inverse of the homogeneous matrix is ( ) The inverse rotation matrix is the transpose and denoted by (R NI i) T = R I i N .As rotation parametrizations we use Modified Rodriguez Parameters (MRP) χ, unit quaternions q = (q 0 , q 1 , q 2 , q 3 ), where q 0 represents the scalar part, and rotation matrices R. Conversions are indicated with e.g.χ(q), refer to 39 for the equations.The position of a joint connecting two neighbouring IMUs I i and I j is denoted in the navigation frame by , N i j J , (i, j) ∈ J, the set of all joint indices.This joint position is, however, estimated separately in the frames of the two adjacent IMUs as , I i i j J and , j I i j J .For our first test case of a simulated three link chain, one segment is connected to the environment with a fixed point, which is also estimated.The indices for additional (optional) fixed point estimates, fix The symbol X t denotes the collection of all hidden states to be estimated at time step t and X 1: X = denotes a complete time sequence of states consisting of N T time instances.Hence, our kinematic model is defined via IMUs and connecting joints only.This should not be confused with a biomechanical model, where segments S i are defined via anatomical landmarks and are associated with the respective IMUs, typically via a rigid I2S transformation T S i I i.The latter would need an I2S rotation calibration (R S i I i) in addition to the estimates given via the JointTracker.Note that the I2S position can be computed from the I2S rotation and the joint position estimates, via Therefore, the associated biomechanical skeleton could then be directly computed via T NS i = T NI i T I i S i.A lower body skeleton with respective joints and IMUs is illustrated in Figure 1 (left).Figure 1 (right) shows a corresponding biomechanical skeleton with a given I2S alignment.

Problem statement
Synchronized measurements from a network with N I IMUs at time t are denoted , , , .
Here, , y ∈ ℝ 3 is the accelerometer measurement, , I i t y ω ∈ ℝ 3 is the gyroscope measurement, and y ∈ ℝ 3 is the magnetometer measurement, all in the IMU frame I i .We consider a sensor network with |I| > 1 IMUs connecting |J| ≥ 1 joints.Moreover, we assume one IMU being mounted on each segment of the considered kinematic chain.
The aim is to recursively estimate the time dependent state of a kinematic chain of N I IMUs together with the joint and optionally fixed position parameters.For each IMU i, the state includes its position χ , and angular velocity ω , all with respect to the navigation frame N and element of ℝ 3 .Together with the previously mentioned IMU centered joint and fixed position parameters, the state can be written as .
where the specific grouping will be explained later.
Mathematically speaking, we want to obtain a maximum a-posteriori (MAP) state estimate, given the sensor measurements from (3) and statistical models (priors) regarding the IMU motion and the kinematic structure of the skeleton (joint connections).To obtain a formal estimation problem, we utilize Bayes' theorem and the fact that a monotonic function does not change the estimates for extreme points.Thus, the estimation objective can be written as 1: 1: 1: : Note, additional parameter estimates, like sensor biases, often denoted as θ 40 , could be included, but are left out for brevity of notation.Using the Markov property, the prior can be further subdivided into a prior distribution at the first time step and a temporal prior between subsequent time steps, i.e.
Assuming independent measurements and applying the Markov assumption also to the likelihood, the objective becomes The following state splitting is used in this work:
The splitting is motivated by the following discrete state-space model ( ) , where , are modeled as parameters.Since they are linear with respect to the other states in the joint (49) and velocity (50) connection models, their estimation can be viewed as a linear Bayesian regression jointly estimated with the other states.The remaining states in 2 t X are time-dependent but do not have an associated measurement model.Thus, they need to satisfy the proposed motion model strictly to avoid increasing estimation noise.The proposed subdivision is crucial to obtain a consistent and long-term stable recursive estimator (see Section 2.3).Leaving out the constants, the estimation problem with the model Equation (9) to Equation (11) can generally be written as the following constrained non-linear weighted least squares problem where X t ∼ N(X t ; ˆt X , P t ) are state distributions for each time step and 1: ˆT N X is the collection of all these state distributions.

Derivation of optimization-based recursive Bayesian estimator
Since we are interested in online estimation with a low computational cost, a consistent recursive estimator of ( 12) is derived in the following.It estimates the states for the current time step and reduces to the Extended Kalman Filter (EKF) as one special case.For the derivation, we rewrite the general model Equation (9) to Equation (11) as ( ) , with w t ∼ N(0, , where small artificial noise (ε t ) is added to the variables We can now linearize Equation ( 14) with respect to the noise at ε t = 0 to obtain ( ) , for Q converges to the zero matrix for decreasing ε t .Without loss of generality, we assume 2 t Q � to be invertible for all ε t > 0 (otherwise replace ). Analogously to the derivation of the (E)KF in 41, we can write Equation (16) to Equation (18) as linear equation system for time steps t = 1 . . .k, with measurements up to time step m ≤ k, and with the collective state vector in 41).The least squares solution to this equation system is the minimizing solution to the following objective function ) with In (19), J k|m (X 1:k ) describes the objective with motion model residuals up to time step k and measurement residuals up to time step m and thus state estimates up to time step k.
Note that the limes of ε must not be taken in (19), since this would lead to a singular matrix for the residual weighting of the third term.Instead, a recursive estimation form is derived in the following.This allows taking the limes in the final form and naturally leads to a consistent prediction.To obtain a recursive formulation, similarly to 41, we proceed by considering the case k = m, where there are measurements available for each time step.
The prediction step consists of determining the estimate , the latter including the measurements up to time step k − 1.To derive the prediction step, we write the objective (19) in recursive form as Here, J k−1|k−1 (X 1:k−1 ) relates to the prior distribution p( and, exploiting the Markov assumption, could be represented as The gradient of ( 21) is ) with The Hessian of ( 21) is with 21) is a quadratic objective function and ∇ 2 J k|k−1 (X k−1 ,X k ) is positive definite, a single iteration of Newton's method yields the minimum.Thus, the prediction can be written as for any initial guess for . Now, observing that if the estimate of the previous time step was a minimum ) sets all other terms in (24) to zero, the initial guess yields a minimum of (21).Considering this in (26), i.e. that the second term on the right-hand side equals zero, the estimate is this initial guess and the prediction mean reads The prediction covariance can now be computed by evaluating the Hessian (25) at the prediction mean , which gives With the inverse of Schur's complement the inverse of ( 28) can be written as Since the inverse Hessian ( 29) is computed for both time steps, we can extract the covariance for the predicted mean ( 27) from the lower right corner of (29).Therefore, In this form Q k does not need to be inverted and can be singular, thus we can now take the limes ε k → 0 to obtain with Thus, a well defined, consistent and recursive prediction for the model ( 9) and ( 10) is given via the mean (27)  and the covariance (31).With this we have derived the prediction To obtain the posterior distribution we have to include the measurement update for time step k.From the distributional point of view the posterior is proportional to the product of the prediction with the likelihood From the objective function point of view we have to approximate This can be rewritten as where the prediction yields a prior on the states for the measurement update.With the non-linear least squares problem (34), and the exact prediction ( 27) and ( 31), we can derive multiple EKF variants, like the Iterated EKF (via the Gauss-Newton method) or a Levenberg-Marquardt Iterated EKF, depending on the numerical optimization scheme used to minimize this objective.This is, in detail, described in 42 and can be analogously applied here.In this work, we use a Gauss-Newton method with line-search for the measurement models to obtain the estimate The covariance is computed as with P k|k−1 from (31).

Motion models
This section specifies the equations for constructing F t in ( 9) and (10).The IMU motion models assume constant acceleration and constant angular velocity from time t − 1 to t, with sampling time ∆t, hence ( ) 1 , -1 where denote independent normally distributed process noises.The transition models for the joint (i, j) ∈ J and fixed point l ∈ I parameters are fix, fix, 1 .

Measurement models
This section presents the different measurement models used for constructing H t in (11).These comprise models for exploiting the actual sensor measurements, i.e. the IMU data, as well as, models to incorporate assumptions concerning the kinematic chain (connected joints) and its interaction with the environment (fixed position, zero velocity detections).The latter are required for positional drift compensation.In this work, we evaluate two optional measurement models for positional drift compensation: 1) fixed position (assuming that one IMU moves around a fixed point in the navigation frame) is evaluated on the simulated three link chain (see Section 3.2), 2) zero velocity detections are evaluated on the lower body walking data from humans (see Section 3.3) and a humanoid robot (see Section 3.4).Note, for the purpose of positional drift compensation, position or velocity based information needs to be induced into the estimation at least for one IMU, since the joint connection models ((49, 50) below) ensure the propagation of such induced information through the kinematic chain.

IMU data.
Let g N be the gravity vector in the navigation frame, the accelerometer and gyroscope measurement models are 27 ( ) ( ) where denote independent normally distributed measurement noises.To omit heading drift, the magnetometer measurements can be used.The following (implicit) model is constructed to only provide information in the heading direction and omit information about the local dip angle, which is a common way to reduce the influence of magnetic disturbances 43 : where (•) x , (•) y denote the x and y component of the respective vector and denotes normally distributed measurement noises.

Joint connections.
The skeleton structure, i.e. the fact that the segments of the tracked skeleton (kinematic chain) are connected by joints, is also incorporated via two measurement models.The information is operationalized by equating the two IMU-centered position states per joint in the navigation frame, as well as, the joint velocities: where [a×] denotes the cross-product matrix of vector a and denote independent normally distributed measurement noises.

Fixed positions. This model assumes an IMU i moving around a fixed point fix
N P in the navigation frame (e.g. the upper arm IMU moving around the shoulder joint in the simulated three-link chain).The fixed point is then estimated in the respective IMU coordinate system I i as additional IMU-centered joint position state and , t t using the stance hypothesis optimal (SHOE) detector 44 .Note, in case of multiple detections, only the IMU with the lowest SHOE value is kept.The measurement model for a zero velocity detection (m c,t = 1) is where denotes normally distributed measurement noise.Note, in a locomotion scenario, zero velocities are typically regularly detected at the foot IMUs during the stance phases.In locomotion activities with flight phases, such as running or jumping, these events become less frequent, and the estimation can show drift, if there is no other drift compensating mechanism.

State initialization
There exist different approaches for state initialization in the context of IHMC, e.g.initially assumed poses 45,46 or orientation initializations R NI i, i ∈ I, while the other quantities are assumed to be zero 47 .
In this work, we also assume the kinematic chain to be initially stationary, i.e.IMU states ,0 , , 0 0 , , , Moreover, the IMU positions ,0 The IMU orientations 0 NI i χ , ∀i are initialized using the accelerometer and the magnetometer measurements via the triad method 48 as e.g. in 27.Note, this is the only point in time, where magnetometer measurements are required.
In order to support Monte Carlo simulations in the evaluation, the joint positions are initialized randomly with 0 , , and l is the maximal segment length in order to obtain a reasonable distribution (l = 0.4 in the experiments).The fixed positions are treated analogously.

On observability and identifiability of the estimated quantities
Following the notation about structural and practical observability, based on 49, we can identify the following structural non-observabilities.The Gauge freedom 34,50 in initial position 0 , N i I ,∀i and initial velocity 0 , N i I � ,∀i (before zero velocity updates) are correct only up to additive offsets, since no absolute references exist for these estimates.To mitigate this problem, we define prior distributions for the initial position and velocity, with the assumptions stated in Section 2.6.
Regarding practical observability, the joint position estimates need "sufficient" excitation to be observable.Ideally, all degrees of freedom of the relative rotations between the neighboring IMUs have to be excited.This can be analyzed using an approach similar to 38.Alternatively, we propose a scalar joint position uncertainty measure based on the covariance of our state estimator.This can be exploited to indicate sufficient motion excitation for a given joint (e.g. by applying a threshold), or, in other words, as convergence indicator for joint position estimation.Let , , (  ) be the average of the marginalised (3 × 3) covariance matrices of joint (i, j) as estimated in both IMU coordinate systems I i , I j .We then calculate the scaled largest eigenvalue of this matrix with , , , , ( ) , EV where s c ≈ 3.37 corresponds to the square root of the 99%-quantile of the Chi-squared distribution with three degrees of freedom.This approximates the calculation of the 99% credibility region using a scaled spectral norm.
Respective evaluations for different movement scenarios can be found in Section 3.
While, in this work, magnetometers are always used to initialize the IMU orientations via the triad method (see Section 2.6), the magnetometer measurement model ( 48) can be used to obtain observable orientation estimates with respect to the heading direction in the navigation frame.However, if only relative rotations are relevant, and sufficient motion is present to allow for local observability at each joint, as formalized in 51, the magnetometer measurements can be neglected during tracking.According to 51, local non-observability is given, if acceleration and jerk at a joint are co-linear.This only happens in case of no motion or motion along the gravitational vector.In Section 3, we also evaluate the effects of using or omitting magnetometer measurements during tracking.

Results
This section presents the evaluation of the proposed approach with respect to different estimated quantities (IMU orientations, joint positions) and aspects (convergence speed, accuracy, drift behaviour).The evaluation metrics are summarized in Section 3.1.Evaluation is performed on three different motion scenarios: simulated data of a three-link kinematic chain fixed in space (sinusoidal motion in all degrees of freedom) (Section 3.2), real lower body human gait data from 25 healthy persons measured at two points in time (Section 3.3), and gait data captured from a humanoid robot (Section 3.4).The covariance settings used in the proposed recursive Bayesian estimator are listed in Table 1 (in the order they are introduced in Section 2.4 to Section 2.6).

Evaluation metrics
The joint position estimates are evaluated via the segment length errors.Let sl i be the length of segment S i with corresponding IMU I i and connecting joints J h,i and J i, j .The estimated segment length at time t is then and the error can be calculated with respect to the reference sl i,ref as .
The reference is either chosen as the ground truth length of the segment (simulation study, Section 3.2), the estimated segment length at the end of a reference motion (gait study, Section 3.3), or a measured reference (robot, Section 3.4).In the second case, the reliability of the joint position estimates is additionally evaluated exploiting gait data from the same person measured at two different days, with different IMU mountings.In the first case (simulation study), we also evaluate the end-point errors for each estimated joint position in each IMU coordinate system, e.g.
where reference joint position , , is taken from the constructed model.
The estimated relative IMU orientations ,∀(i, j) are evaluated on the angle errors , , ,

27
. Analogously, the global IMU orientations are evaluated in the simulation study.To summarize the errors (deviations from the references) over a complete sequence in one number, the root mean squared error (RMSE) is exploited as where X is the index set of the respective measure, e.g.l = (i, j) ∈ J for joint (i, j).
To evaluate the convergence behaviour of the joint position estimates, we also consider the sample standard deviation (RMSE Std.) of the RMSEs for different initial values obtained via Monte Carlo sampling.
The validity of the relative IMU orientations is furthermore evaluated via the coefficient of determination R 2 52 .Moreover, systematic biases and drift in IMU orientation and segment length estimates are evaluated via the ordinary least product regression (OLP) coefficients 53 .

Covariance Value
3.2 Simulation study on a three-link chain This section presents the evaluation results of the proposed approach using the fixed position measurement model for positional drift compensation (see Section 2.5.3) on simulated (noisy) IMU data.The results regard the identifiability of the joint positions as well as the validity of the IMU orientation estimates.
The motion and IMU data simulation are taken from 27 (Section 2.6.3,sim-fast-artificial sequence).In short, the motion is obtained by assuming sinusoidal angle profiles for all joint degrees of freedom.This satisfies the need for sufficient motion for the identifiability of the joint positions.Based on the above, the IMU data, ground truth IMU poses and joint positions are calculated.

Joint position estimates.
The convergence speed of the joint position estimation is evaluated with N = 10 Monte Carlo samples for the initial joint positions as well as for the noises applied during IMU data simulation.
Comparable results were obtained with and without magnetometer model.
Figure 3 shows the convergences of the joint position estimates.In the figure, the red lines correspond to the end-point errors (56) for each estimated joint position in each IMU coordinate system.The dark grey shaded areas (labelled init uncertainty) correspond to the sample standard deviations over the errors and Monte Carlo samples, and the light green shaded areas (labelled max model uncertainty) correspond to the maximum estimated joint position uncertainties (53) over the Monte Carlo samples.The latter is our currently proposed scalar joint position uncertainty measure to indicate the motion excitation for the respective joint.
It can be observed that the joint position estimates are converged with very low init uncertainty after about 200 samples, which corresponds to about 2 seconds of data.Given the Monte Carlo sampling, this can be considered a robust result for random initializations for the current experimental setup.It is also visible, that the proposed joint position uncertainty measure (max model uncertainty) is a valid convergence indicator in this experiment (the light green shaded areas include the red lines).Comparable results were obtained for the estimation of the fixed position.
Figure 4 shows results concerning the segment length errors as calculated via (55).Note, since the segment length computation requires the joint positions, only two lengths can be calculated.For the third (distal) segment of the three link chain, where no fixation in space is assumed, the segment length calculation is not possible.The final absolute segment length errors (after time step 1256) are very low with 1.1 millimeters for the first and 1.5 millimeters for the second segment.

IMU orientation estimates.
To assess the IMU orientation estimates and the influence of the joint position estimation on these with and without magnetometers, we also investigated the angle errors of the estimated relative and global IMU orientations (57) for the randomly initialized joint positions.The results for the relative IMU orientations are shown in Figure 5, and the results for the global IMU orientations are shown in Figure 6.
The figures show that the IMU orientation estimates, relative or global, with or without magnetometers, are not significantly affected by the joint position estimation.The errors stay mostly below 1°, independently of joint position initialization, magnetometer usage or reference frame.Slightly higher init uncertainties can be observed without the magnetometer model, which is expected due to the lower amount of information fed to the estimator when omitting magnetometer information.Note that, even if not visible in this experimental setting, without magnetometers, the global heading direction is not observable, which might result in drift in the heading direction when processing longer data sequences with other disturbances (besides noise), see e.g.45.This is, however, not the focus of the present work.

Long term estimation.
To assess the long term stability of the proposed approach, we investigated the relative and global IMU orientation estimates over 300 motion cycles (i.e.N = 300 • 629 samples at 100 Hz, which corresponds to 31.45 minutes) with respect to systematic errors.For this, the joint positions and initial IMU poses were initialized correctly.Moreover, the relative IMU orientation estimates are without the magnetometer model, while the global IMU orientation estimates are with the magnetometer model (since the global heading direction is otherwise not observable, as pointed out in the previous section).As introduced in Section 3.1, we computed the OLP coefficients, the R 2 scores and the RMSEs between the estimated orientations and the ground truth orientations, both reduced to single angles (57), considering all samples.The results in Figure 7 show that the proposed approach performs very well in all measures: The red OLP regression lines correlate very well with the blue dots (indicating the individual angle samples).The latter show a very small variance.In more detail, the absolute additive OLP coefficients are all far below 1° indicating only very minor biases, whereas the OLP scaling factors are all 1.0 indicating no systematic error like drift.These results are confirmed by the R 2 scores (all equal to 1.0) and the very low RMSEs all far below 1°.These results demonstrate the consistency and long term stability of the proposed approach for the noisy simulated IMU data in a scenario with sufficient motion excitation.

Human lower body scenario
This section presents the evaluation results of the proposed approach using the zero velocity detections measurement model for positional drift compensation (see Section 2.5.4) on real lower body human gait data, namely the TUK-6-minute-walking dataset 54 , first introduced in 45.Kaiserslautern (TUK) and meets the criteria of the declaration of Helsinki.After receiving all relevant study information, the participants signed an informed consent to the study including a permission to publish the data.

Experimental setup.
In contrast to the previously described simulated motion, there is no exact ground truth data available for the lower body dataset.In particular, the joint position reconstruction from skin markers typically involves approximations, such as regression models 55 .Therefore, we focus our evaluation of the joint position estimates on test-retest reliability of the derived segment lengths (rather than joint position validity).For the IMU orientations, we focus on validity (comparison with the optical data), since the optical marker bodies rigidly attached to the IMUs can be very accurately tracked to deduce the IMU orientations.The subsequent results are all without the magnetometer model.Comparable results were obtained with the magnetometer model.

Joint position estimates.
The convergence speed of the joint position estimation is exemplified in Figure 9 for one test person via the segment lengths derived from the joint position estimates (red lines).The meaning of the shaded areas corresponds to the description in Section 3.2.2.However, note that the red lines indicate the segment lengths rather than the estimation errors and that the max model uncertainty reflects the maximum uncertainty of the two joint positions.
It can be observed that the segment lengths (and hence the joint position estimates) reach steady states after about 300 samples, which corresponds to about 5 seconds of the 60 Hz data.Moreover, the estimated lengths seem feasible for a human body.Also the drops in the proposed joint position uncertainty measure (max model uncertainty) roughly correspond with reaching the steady states (given that the max model uncertainty is associated to only one of the joint positions).However, the red lines keep showing visible swings between frames 200 and 300.A deeper analysis showed that sudden changes in the estimates correlate with the zero velocity detections, which might be inaccurate, but at the same time significantly reduce the model uncertainties.
For quantifying the reliability, we computed statistical differences in segment length estimates after 2000 samples between test and retest data.The assumption is, that the segment lengths (per person) do not change between test and retest, where the same person is measured with the same hardware performing the same walking activity on two different days.However, note that the IMU mountings, which heavily influence the estimated quantities in the local IMU coordinate systems, are different between test and retest, though they were not explicitly varied.The I2S differences (approximated from the optical data) are quantified in Table 2. Table 3 shows the rather small RMSEs (below 2 centimeters) over all persons of the final segment length estimates between test and retest and the sample standard deviations due to Monte Carlo sampling (init uncertainties).
Altogether, it can be concluded that straight walking provides sufficient motion excitation for obtaining reasonable joint position estimates via the proposed approach, though motion excitation is significantly lower in the frontal and transversal plane compared to the sagittal plane.

IMU orientation estimates.
To evaluate the validity of the relative IMU orientation estimates, we computed the OLP coefficients, the R 2 scores and the RMSEs between the estimated orientations and the orientations obtained from the optical data during the test session, as before, both reduced to single angles and considering all samples.The results are shown in Figure 10.Also here, the red OLP regression lines correlate well with the blue dots (indicating the individual angle samples), though the latter show higher variances compared to

Humanoid robot lower body scenario
This section presents the evaluation results of the proposed approach using the zero velocity detections measurement model for positional drift compensation (see Section 2.5.4) on real lower body data captured from a humanoid robot 56 .

Experimental setup.
The same IMU hardware and a comparable IMU setup were used as in the previous experiment.In particular, seven IMUs were attached on the foot, lower leg, upper leg and pelvis segments of the humanoid robot Reem-C from Pal Robotics (Barcelona, Spain).IMU data was collected at 100 Hz.Moreover, the robot motion was synchronously captured with a marker-based optical system (Qualisys AB, Göteborg, Sweden) at 150 Hz.The setup is shown in Figure 11.The captured activity consists of the robot walking in a circle along the edge of the recording volume of the optical motion capture system.The considered IMU data sequence has 40.000 samples corresponding to about 6.67 minutes.
The Reem-C robot is equipped with electrically driven joints.The ankles have two degrees of freedom, the knees are hinge joints (one degree of freedom) and the hips are ball-and-socket joints (three degrees of freedom).
Compared to the previous experiment with humans, this setup has some advantages but also challenges with respect to the evaluation of the proposed approach.Advantages are that the robot has rigid segments which do not induce soft tissue artefacts, that it has precisely known segment lengths and that the mechanical joints allow rotations around fixed and predefined axes.
According to the manual, lower and upper legs have each a length of 0.3 meters, and the pelvis has a width (distance from hip joint center to hip joint center) of 0.15 meters.
Potential challenges are the very low acceleration and velocity of the robot motion, given that the joint position estimation requires sufficient motion excitation for convergence (see Section 2.7), and the hinge joints (knees).In case of a perfect hinge joint, the relative rotation between the two associated segments only takes place around a fixed axis, which results in the fact that the joint position estimates are not identifiable, since each position on the hinge axis represents a valid solution to the models in 2.5.2 32,34,57 .The following results are only without magnetometer model.

Joint position estimates.
The convergences of the segment length estimates are illustrated in Figure 12.The plots show slower convergences as compared to the human walking experiment (cf. Figure 9).Table 4 shows the accuracy of the estimated segment lengths with respect to the above mentioned reference measures, with the RMSEs and standard deviations over the last 100 samples of the motion sequence.The segment length estimate at the pelvis is most accurate with a RMSE of about 1.5 centimeters.This might be due to the higher motion variability at the hip joints.The RMSEs of the other segments range between 1.5 and 3.7 centimeters.Altogether, reasonably stable estimates close to the ground truth measurements are reached for all segments despite the overall slow motion and the hinge joints of the robot.
However, note the occasionally appearing abrupt changes in the segment length estimates in Figure 12, e.g.around frame 1000 at the left upper leg.A deeper analysis showed that these result from false positive zero velocity detections (due to the slow robot motion), where erroneous information is fed into the estimator via the model described in Section 2.5.4.Similar artefacts were observed in the previous experiment, so that is seems desirable to decouple the joint position estimation from the global positional drift reduction mechanism.

IMU orientation estimates.
The accuracy of the relative IMU orientations with respect to the optical reference, reduced to single angles, is shown in Table 5.Altogether, the RMS angle errors are very low (below 2°).

Conclusions and open challenges
This section summarizes and draws conclusions from the evaluation in Section 3 and then outlines limitations, open challenges and future work.

Summary and conclusions
In this work we derived and evaluated an optimization-based recursive Bayesian estimation approach for simultaneous estimation of IMU poses (orientation and position) and joint positions for kinematic chains of different configurations.To the best of our knowledge, this is the first real-time capable approach for simultaneous IMU pose and joint (and fixed point) position estimation.We also propose an indicator for joint or fixed point position estimation convergence (53).We evaluated the proposed approach in different experimental scenarios: simulated IMU data from an arm-like three-link kinematic chain fixed in space as well as human and robotic lower body walking data.
The proposed estimation approach performs excellently on noisy simulated IMU data from a three link chain with a fixed position and with sufficient motion in all degrees of freedom.This concerns the convergence speed (below 200 samples at 100 Hz) and accuracy of the joint positions (final segment length estimation error below 1.5 millimeters) as well as the accuracy of the IMU orientations (all errors below 1°), even without magnetometer usage and during a longer data sequence (over 30 minutes).This demonstrates the overall consistency of the derived estimator.
Also on real IMU data from different locomotion scenarios with zero velocity detections, very good results were obtained for the accuracy of the magnetometer-free relative IMU orientation estimates (RMSE below 2°).Moreover, reasonable results were obtained for the joint position estimates in terms of convergence speed (about 300 samples at 60 Hz for the humans and between 500 and 1000 samples at 100 Hz for the robot), reliability (RMSE of segment length estimates below 2 centimeters between test and retest for the humans) and accuracy (RMSE of segment length estimates below 4 centimeters for the robot).This demonstrates the successful application of the proposed approach on real data and in scenarios with sub-optimal conditions, such as the comparably planar nature of walking movements in general, in particular for the robot with perfect hinges at the knees, in combination with a very slow walking speed.Altogether, the proposed approach is a promising method for real-time IMU pose and joint position estimation without the need for I2S calibration.Hence, in this sense, the approach is calibration-free.Note that a pose based I2S calibration procedure (e.g. as in 58) could of course be used, with the additional advantage that the I2S position can be directly computed from the joint position estimates, as mentioned in Section 2.1.
Casting this in the direction of visual human pose tracking, our approach allows for IMU mounting independent estimation of joint positions in a navigation frame.Hence, it paves the way for the effortless estimation of segment lengths (in physical units), as well as the integration with joint position estimates from computer vision systems.

Limitations, open challenges and future work
As mentioned in Section 2.6, in the present work, a single magnetometer sample per IMU is always used for initializing the IMU orientations in one common reference frame.Hence, magnetometer-free refers to not using the magnetometer model in the tracking (after initialization).A magnetometer-free initialization is therefore an open challenge and part of our future work.First work in this direction can be found in 59.
The above summarized results refer to magnetometer-free relative IMU orientation estimates, since drift-free global heading estimation without magnetometers is not in the focus of this work.Moreover, we did not focus on critical cases such as completely stationary phases or phases with motion only along the gravitational vector.In these cases, relative IMU orientations are also not observable in the proposed approach without using magnetometers 51 .However, an intelligent usage of magnetometer information to address these cases is currently investigated.
Moreover, in the case of insufficient motion excitation for joint position convergence, prior knowledge, e.g. a prior on the segment-length like in 34, could be included, or the joint position uncertainty could be used to guide the person to perform motions that target the uncertain degrees of freedom.
Despite the non-observability of the joint position of a perfect hinge via the proposed approach, as described in Section 3.4.1,reasonable joint position estimates were obtained for both the human and the robot dataset.
At the same time, in particular for the robot with the mechanical hinges at the knees, improved results might be obtained when injecting these constraints via specific hinge measurement models.
The proposed scalar joint position uncertainty measure proved to be a good indicator for joint position convergence in the experiments, in particular in the simulation study.On real data, a scaling calibration might be required as an additional step.However, the approach is not robust against outliers.In particular, imprecise or even false positive zero velocity detections (as observed in the human and robot experiments) prior to convergence decrease the uncertainty and at the same time significantly harm the joint position estimates.
For locomotion scenarios, it is therefore desirable to decouple the joint position estimation from the global position drift reduction mechanism.
Also, since the covariances of the joint position estimates typically decrease with each incoming measurement, stationary phases (e.g. if the person stands still) can lead to overly certain estimates in a specific direction.This can slow down the convergence (even if it would not decrease the proposed convergence indicator).
Finally, the integration of the proposed approach with inside-out or outside-in visual pose reconstruction is part of our future work.

Lauro Armando Contreras Rodríguez
Universidad Autonoma de Sinaloa, Culiacán, Sinaloa, Mexico Overall, the article adequately presents the development of the evaluation of an algorithm for real-time monitoring of the position of joints using inertial sensors.
Taking into consideration the problem posed in the introduction section, the methodological process describes step-by-step the algorithm used to capture the information through the optimization of the derivation of a recursive Bayesian estimator for each IMU sensor to subsequently adapt it to a kinematical model.
The results shown provide a clear explanation of what was obtained for the assessment scenarios, as well as the conclusions show the virtues and drawbacks of the proposal.
However, there are some points that must be addressed: Section 3.1 (Evaluation metrics) provides information that describes how the proposal will be assessed, this information belongs to the methodology section.
○ Figure 12, Table 4, and This paper proposes a method for calibration-free motion capture of kinematic chains using bodyattached IMUs.
Overall, the paper is well written and the motivation clear.However, particularly concerns regarding novelty and contributions should be clarified and alternative, baseline methods should be added to the experimental validation.Specifically, addressing the following issues will help to increase the overall quality of the manuscript:

# Content
A) The novelty of the work is not distinctly outlined.The authors acknowledge the existence of various I2S calibration approaches but do not explicitly compare their method to these predecessors to highlight the specific advancements made.Furthermore, while there are calibration-free IHMC methods available, the authors have not explained why these were not considered suitable for the task at hand.Consequently, it remains uncertain whether this work addresses an unresolved issue or if it offers a solution that outperforms existing methods, which would necessitate a comparative analysis with alternative approaches.
As an example, the authors should add sentences that compare the proposed approach, e.g., "Previous approaches to joint position estimation typically work offline, often also independently per joint, and require high-frequency inertial data; e.g., 32,37,38, though drift-free, typically require a higher sampling rate for obtaining accurate results as compared to approaches working on integrated states."-> [...] In contrast to these methods, our approach does not ... Also directly referring to the cited sentence, it seems that the authors are motivating the contribution by: 1) online-capable, 2) not independently per joint, 3) not requiring high-frequency inertial data.But then they do not properly address these points, regarding -1) The ability of the proposed method to be applicable online (and in real-time?) is not further discussed in the manuscript.
-2) The advantage of the proposed method compared to previous work that uses a independently per joint approach is not shown because the authors then later do not compare the proposed method to alternative, baseline methods -3) This work seems to be using 60Hz+ IMUs which, whilst not being very high-frequency, certainly are also not low-frequency IMUs (like 25Hz).
B) From the problem statement, it is unclear whether or not the IMU-to-segment orientation must be known or if sensors can be attached arbitrarily.
C) The quality and potential of figures is not yet fully utilized.Specifically, the large amount of figures make the results difficult to comprehend.The figures seem to be presented in an exhaustive manner rather than as an highlighting tool.For example, Figure 7 seems highly space inefficient.The compressed information value of the figure is too low to justify its inclusion.A more focused presentation with well chosen figures to highlight exemplary trials and rely on tables for point estimates may be advantageous, or instead of exemplary trials, the authors could plot the average error and average uncertainty over time.
D) It is unclear why the joint position estimates for human trials can not be validated.The authors could test the joint position estimates by placing an additional IMU to a segment with a fixed and known offset.This way the joint position estimate must be different by exactly this known offset value for the two IMUs.
E) There is a lack of alternative, baseline methods.The authors should compare themselves to prior work.For example, for comparing the joint estimation performance, the authors could use [Seel, et al., 2012 (Ref 1)], for comparing orientation estimation performance a simple magnetometer-reliant, joint-independent baseline can be straightforwardly computed for all scenarios.Moreover, by comparing yourself to the independent baselines (in a sense that they are applied for each joint independently) you can then then discuss the difference and advantage to the proposed kinematic-model-based approach.
F) The code for JointTracker should be made openly available.

# Formatting, Typesetting
A) The formatting of figures is suboptimal.
-The figure titles use a rather unattractive underscore notation even though the authors have defined mathematical symbols for these quantities.
-The x-axis should be given in units of second and labeled time for a more comprehensible presentation.The sampling rate can be included in the caption.
-The figures should be embedded into the documents as vector graphics.
-The font sizes, tick sizes, tick labels, titles, labels are all too small and inconsistent with the manuscript font size and font.

# Minor
A) The authors motivate using *human* motion capture but this choice seems confusing.The proposed method is applicable to any kinematic chain, you even shows this applicability to nonhuman systems during experimental validation.Thus, why the restriction to human motion capture in the motivation?
B) The acronym IMU seems incorrectly introduced.
C) Some sentences are overly lengthy and hard to understand, e.g., "Previous approaches to joint position estimation typically work offline, often also independently per joint, and require highfrequency inertial data; e.g., 32,37,38, though drift-free, typically require a higher sampling rate for obtaining accurate results as compared to approaches working on integrated states."Tools like Grammarly can detect such sentences and rewrite them accordingly.D) Some hyperrefs are not working, e.g., "In Section 3, we also evaluate ...".
E) The choice of reporting length scales in meters in this context seems suboptimal.A more natural choice might be centimeters, consider for example Figure 9. e) The fixed positions that you describe in Section 2.5.3 are not clear, especially also not since I didn't see them mentioned anywhere in Section 3. e) Only at the end of section 2.1 you mention "Joint Tracker" but this has not been introduced anywhere.In fact, throughout the paper you refer to your method at Joint Tracker, an iterated EKF and an optimization-based method.I would suggest to stick to one naming.f) You mention I2S calibration multiple times, e.g. at the end of section 2.1 and in Equation (59).It is not clear how this relates to the estimates from your approach as you don't estimate the relative orientation of the sensor with respect to the segments?g) The title of section 2.6 is observability and identifiability but throughout the text you only mention observability.In Section 3.2 you use the wording identifiability though.
3) The relation between your main contribution and the results is also not completely clear: a) It is not clear what you want to show exactly with the experiments.For instance, you test a lot how global / relative orientation estimates depend on including the magnetometer or not.But in the rest of the paper it is not clear that this is an important part of your study.b) To me the legend entries "init uncertainty" and "max model uncertainty" are not clear.With init I expect that this is related to the initialization which I don't think is the case.And why don't you refer to the max model uncertainty as your uncertainty measure?c) Table 2 shows very large rotation errors but I don't think this is discussed in the text?d) 10 Monte Carlo simulations seems very little.e) How does your method handle biases in the inertial sensor signals?f) Why did you correctly initialize the positions and poses in Section 3.2.4? 4) Some comments about the conclusions: a) You conclude that your method can be run real-time, but this is not evaluated in Section 3 (no computational times are included).b) You also draw a conclusion about convergence speed, but this is highly dependent on the movement, right?c) You mention that you obtain reasonable estimates of unobservable variables.Why is this the case and why does constraining unobservable variables lead to improved estimates?5) Also a more general comment: You make an interesting design choice in Section 2 and I think it's worth commenting on the reason for that.You choose to not include process noise on the position and velocity states but then you use a "loose" joint constraint for both the position and the velocity in Section 2.5.2.Why do you not take the limit of those noises to zero as well?And why do you need a constraint on both if both are completely linked through the zero process noise in the time update?6) Some additional comments: a) The formatting of the paper is not intuitive to me, though of course it could be part of the template: Whenever there are numbers in the paper, these are citations.I would expect some brackets around these.Furthermore, e.g. on page 4 (2.1) refers to a section while I would expect this to be a reference to an equation.b) Some parts of equations / matrices, e.g. in Equations ( 2) and (3) are highlighted.It is not clear why.c) Shouldn't it be argmax instead of max in Equations ( 5) and ( 7)? d) Shouldn't the epsilon three lines below (18) be a different symbol than the epsilon of which you just took the limit to zero? e) There's a dot missing on the I in Equation ( 8) in the definition of X^2.
f) The notation of the exp and the quaternion multiplication in (42) have not been introduced.g) Was E_{l,t} in (58) defined?h) In Figure 7 your legend suggests that there are yellow dots in the figure that are barely visible.The red dots, however, are not mentioned in the legend.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound?Yes

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?Yes Are the conclusions about the method and its performance adequately supported by the findings presented in the article?Partly Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Sensor fusion I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 1 .
Figure 1.The left image is an illustration of the kinematic model with joints (red squares), IMU poses, and connections between IMU and joint centers (blue lines), which is tracked using the proposed approach.The right image shows the corresponding biomechanical skeleton model with I2S calibrations, based on 36, where the joints are emphasized with semi-transparent spheres.

Figure 3 .
Figure 3. Three-link chain: Joint position estimation errors and uncertainties over two motion cycles, i.e. 1256 time steps at 100 Hz.

Figure 4 .
Figure 4. Three-link chain: Segment length estimation errors and uncertainties over two motion cycles, i.e. 1256 time steps at 100 Hz.

Figure 5 .
Figure 5. Three-link chain: Angle errors and init uncertainties of estimated relative IMU orientations with magnetometers (left) and without magnetometers (right) for randomly initialized joint positions.

Figure 6 .
Figure 6.Three-link chain: Angle errors and init uncertainties of estimated global IMU orientations with magnetometers (left) and without magnetometers (right) for randomly initialized joint positions.

Figure 7 .
Figure 7. Three-link chain: OLP coefficients, RMSEs and R 2 scores between the estimated and the ground truth IMU orientations, all reduced to single angles.The upper row refers to relative IMU orientations estimated without magnetometers, and the lower rows refer to global IMU orientations estimated with magnetometers, over 300 motion cycles at 100 Hz, i.e. 31.45 minutes.In the plots, the red regression lines represent the OLP coefficients and the blue dots represent the individual angle pairs (estimated over ground truth).A perfect estimation results in blue dots on the red line.

Figure 8 .
Figure 8. Experimental setup of the TUK-6-minute-walking dataset with IMUs and optical markers.

Figure 9 .
Figure 9. Human lower body: Estimated segment lengths and uncertainties for a selected test person (P1) of the TUK-6-minute-walking dataset.

Figure 7 .
Figure 7.In detail, the absolute additive OLP coefficients are all far below 1° indicating only minor biases, whereas the OLP scaling factors are all close to 1.0 indicating no systematic errors such as drift.These results are confirmed by the R 2 scores (all close to 1.0) and the low RMSEs (all below 1°).

Figure 10 .
Figure 10.Human lower body: Validity of the estimated IMU orientations on the TUK-6-minute-walking dataset (without magnetometer information): estimated joint angles (y-axis) vs. joint angles obtained from the optical data (x-axis).The meaning is analogous to Figure 7.

Figure 11 .
Figure 11.Humanoid robot Reem-C from Pal Robotics (Barcelona, Spain) equipped with optical markers and IMUs.The location of the IMUs are highlighted with blue circles.

Figure 12 .
Figure 12.Humanoid robot: Estimated segment lengths (red lines) and init uncertainties in comparison to the ground truth lengths (green lines).

References 1 .
Seel,T Schauer,T: Joint axis and position estimation from inertial measurement data by exploiting kinematic constraints.2012.45-49 Publisher Full Text Is the rationale for developing the new method (or application) clearly explained?Yes Is the description of the method technically sound?Yes Are sufficient details provided to allow replication of the method development and its use by others?Partly If any results are presented, are all the source data underlying the results available to ensure full reproducibility?Partly Are the conclusions about the method and its performance adequately supported by the findings presented in the article?Partly c) You mention below Equation (11) that the states X_t^2 do not have an associated measurement model.This is not clear at that time in the text since it's not yet known what measurement models you use.d) In Section 2.5 it's not clear that you test your algorithm with and without some measurement models (e.g. the magnetometer) but that some of them you include in all tests.
The associated measurement model is: t v N ∑ tfix denotes normally distributed measurement noise.2.5.4 Zero velocity detections.For all IMUs c ∈ C ᑕ I, zero velocity m c,t = 1, m c,t ∈ {0, 1} is detected, based on a sliding window of inertial measurements : :

Table 5
correspond to the section 3.4, not the Results section.Section 3.3.1 (Experimental setup) requires further information related to the test persons such as: age, height, and weight.The word "young" used to describe the population of this study could lead to interpretation of the use of children, adolescents, and/or young adults.Height and weight help to estimate other movement conditions that are affected by these parameters (the angular velocity is different according to the body segment's size and soft tissues could lead to misalignments of the IMU sensors that modify the interpretation of the body movement).
○ ○Is the

rationale for developing the new method (or application) clearly explained? Yes Is the description of the method technically sound? Yes Are sufficient details provided to allow replication of the method development and its use by others? Yes If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.Reviewer Expertise: Biomechanical movement; IMU sensor; Sport science.I confirm that I

have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. Simon Bachhuber FAU
Erlangen-Nuernberg, Nürnberg, Germany